[57] ABSTRACT

A character voice communication system in which a high efficiency
voice coding system, for encoding and transmitting speech information
at a high efficiency, and a voice character input/output system, for
converting speech information into character information or receiving
character information and transmitting speech or character
information, are organically integrated. A speech analyzer and a
speech synthesizer are shared by both the voice coding and the voice
character input/output systems. Communication apparatus is also
provided which allows mutual conversion between speech signals and
character codes.

8 Claims, 8 Drawing Sheets

CHARACTER VOICE COMMUNICATION SYSTEM

This application is a continuation of application Ser. No. 857,990,
filed May 1, 1986, now abandoned.

BACKGROUND OF THE INVENTION

With the development of digitization of communication lines and of
character input/output techniques such as word processing, realization
of communication apparatus which allows mutual conversion between
characters and voices has been demanded. One approach thereto is
described in Japanese Patent Publication No. 59-19358 entitled "Voice
Transmission System" coinvented by one of the inventors of the present
invention. In the disclosed system, a telex machine is combined with a
voice typewriter and speech synthesis by rule. However, there is a
strong demand in voice transmission to communicate the personal tone
of a speaker. In the disclosed system, it is difficult to realize this
in the character communication. On the other hand, with the
development of the word processing technique, a system which uses a
word processor as a communication terminal and an integrated voice
data terminal (IVDT) which combines a telephone with the communication
terminal have been proposed. However, although the voice and character
data are incorporated in one terminal, information thereof is
independently handled and organic coupling of the information is not
attained.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a communication
system which organically combines voice data communication with
character data communication.

In order to achieve the above object, in accordance with the present
invention, a voice word processing system having a speech synthesis by
rule with a voice typewriter and a high efficiency speech coding
system (speech information compressed transmission) are organically
integrated, where a speech analysis unit and a speech synthesis unit
are shared.

More specifically, in the high efficiency speech coding transmission
system, speech information is separated into spectrum envelope
information and fine spectrum information, and each of them is
appropriately compression-encoded. The spectrum envelope information
has linguistic information (phonological information), and the fine
spectrum information has accent (pitch accent or stress accent) and
intonation of the voice and personal information of the speaker.

In the speech synthesis by rule, it is necessary to synthesize accent
and intonation as well as phoneme information in order for a character
string to be converted to a voice with a high quality. For example, it
is necessary that the synthesis unit use a system which can
independently combine the linguistic information (phonological
information) and the accent and intonation such as "desert" and
"desert". On the other hand, the voice typewriter is primarily
designed to extract the linguistic or phonological information from
the speech and convert it to the character information, and it is
necessary to use an analysis method which eliminates personal
characteristics as much as possible. The accent and intonation
information may be auxiliarily used to delimit a word in continuous
speech and determine a sentence style.

In this manner, through the use of a technique to separate the speech
information into spectrum envelope information and the fine spectrum
information and recombine them, three types of systems, namely high
efficiency voice coding transmission, speech synthesis by rule and
voice typewriter, can be organically combined.

Thus, when the personal characteristic or nuance included in the voice
is to be transmitted, the high efficiency voice coding system is used,
and when the voice input is to be represented by characters or when a
sentence represented by characters is to be voiced or to be
transmitted in the form of characters, the character code transmission
function is used.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a communication system in accordance with
the present invention,
FIG. 2 shows a configuration of an embodiment of the present
invention,
FIG. 3 shows an embodiment which integrates high efficiency voice
coding unit and a speech recognition unit,
FIG. 4 shows [illegible in source] speech recognition unit,
FIG. 5 shows an embodiment which integrates the [illegible in source],
FIGS. 6 and 7 show configurations for speech synthesis, and
FIGS. 8 and 9 show coding unit and decoding unit of high efficiency
voice coding unit.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 is a functional block diagram of a terminal in which a word
processor function and a teletex function are combined with high
efficiency voice coding transmission, speech synthesis by rule and
voice typewriter. Transmission apparatus need not be limited to a
teletex network but other apparatus may be used.

The functional operations are first explained. When the terminal shown
in FIG. 1 functions as a voice compression transmission terminal, a
speech input 101 is separated to spectrum envelope information and
fine spectrum information by a speech analyzer 102, the information is
compressed by an encoder 103 and converted to transmission codes 104
and sent out to a transmission line 105 through a line interface. The
received information is synthesized into a speech waveform by a speech
synthesizer 107 through a decoder 106 and outputted as a voice (speech
output) 108. If the compressed information is temporarily stored in a
memory 109-1, it functions as a voice mail.

When the terminal shown in FIG. 1 is used as a voice typewriter 110,
the speech is recognized by the spectrum information and converted to
a Kana (character) code string 111. The encoder 103 may be omitted and
the output of the speech analyzer 102 may be directly used. In this
step, the converted Kana (character) code string can be handled as a
signal of the same level as that of a key-entered Kana (character)
code sequence from a keyboard 112. Accordingly, functions of the word
processor such as Kana (character)-Kanji (Chinese character)
conversion can be used. The Kana-Kanji converted data may be displayed
on a display (114, 115) or transmitted as character information by
using the teletex function 105. A mail function which uses the
character information may be provided.

It is frequently troublesome to look through a large amount of
character code document information on a display. When important
information is to be visually checked or a chart is to be observed, it
may be displayed on the display 114, but in many cases a large amount
of sentence information is more conveniently listened to as voice. In
this case, the character information string is converted to the
spectrum envelope information and the fine spectrum information, which
are converted to voice waveforms by the speech synthesis unit (decoder
for speech compression transmission) 107 and can be outputted (108) as
voice.

The broken lines within the terminal indicate connections that are
provided because it is necessary at times to use the terminal as a
voice memory or word processor.

In this manner, an economical construction of apparatus is attained by
sharing the various processing functions 102 and 107, thereby
organically converting characters to voice or vice versa.

FIG. 2 shows a configuration of one embodiment of 
the present invention. 

Functions of the major units are explained first. In the present
system, necessary functions are attained by organic combination of
those units.

A speech analyzer 102 analyzes speech input and comprises an A/D
converter 202, a memory 203 for temporarily buffering the speech input
and a digital signal processor (DSP) 204 for processing signals. The
DSP 204 extracts the spectrum envelope information by a speech input
spectrum analyzer (by linear prediction) 204-1, extracts the fine
spectrum information by a predictive residual extractor 204-2 and
extracts a pitch (204-3).

The speech input 101 is digitized by the A/D converter 202 and sent to
the input buffer 203, which has a dual buffer structure so that it can
hold the next speech input without interruption during coding of a
predetermined length of speech.

A vector quantizer 208 comprises a vector code book 208-1 which
contains various tables and a matching unit 208-2 which compares an
input data with the tables to output a code of a matching table. An
item to be quantized is determined by selecting a necessary code book
by an instruction from a main control unit 201.
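
The code book lookup described above can be sketched as a nearest-entry
search. This is only an illustrative sketch: the squared Euclidean
distance, the two-dimensional toy tables and the names CODE_BOOKS,
quantize and vector_quantize are assumptions made here, not details
given in the specification.

```python
from math import inf

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def quantize(vector, code_book):
    """Return the index (code) of the code-book entry nearest to `vector`."""
    best_code, best_dist = -1, inf
    for code, entry in enumerate(code_book):
        d = squared_distance(vector, entry)
        if d < best_dist:
            best_code, best_dist = code, d
    return best_code

# Hypothetical code books; selecting one by name stands in for the
# instruction from the main control unit that chooses what is quantized.
CODE_BOOKS = {
    "spectrum": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "residual": [(-1.0, -1.0), (1.0, 1.0)],
}

def vector_quantize(vector, item):
    """Quantize `vector` against the code book selected by `item`."""
    return quantize(vector, CODE_BOOKS[item])
```

Only the code (an index) needs to be transmitted or passed on; the
receiving side recovers the table entry from the same code book.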

A recognizer 213 comprises a template memory 213-2 and a dynamic
programming (DP) matching unit 213-1. The recognizer 213 is used for
matching a pattern having a time structure.
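
A minimal sketch of DP matching between an input pattern and stored
templates follows. It uses the basic dynamic time warping recurrence
with fixed end points and an absolute-difference local distance on
scalar codes; those choices, and the names dp_match and recognize, are
assumptions for illustration rather than the matching unit actually
used.

```python
def dp_match(pattern, template, dist=lambda a, b: abs(a - b)):
    """Accumulated distance of the best time alignment (basic DTW)."""
    n, m = len(pattern), len(template)
    INF = float("inf")
    # cost[i][j]: best accumulated distance aligning the first i input
    # frames with the first j template frames.
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(pattern[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # input frame repeated
                                 cost[i][j - 1],      # template frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognize(pattern, templates):
    """Return the label of the template with the smallest DP distance."""
    return min(templates, key=lambda label: dp_match(pattern, templates[label]))
```

Because the alignment may stretch or compress time, an utterance
spoken more slowly than its template can still match with zero
accumulated distance.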

A speech synthesizer unit 107 synthesizes voice from codes which are
received by a receiver 206-2 of a line interface 206 as a high
efficiency voice coding transmission code or a code sequence produced
to convert characters to voice by a synthesis by rule program of the
processor 201.

The codes are separated into speech spectrum information and voice
source information by a decoder 205 and stored in designated areas of
a buffer 207 of the speech synthesizer 107. The data is converted to a
filter control signal and an input signal for a synthesis filter 211
by a spectrum envelope decoder 209 and a fine spectrum decoder 210,
respectively, and supplied to the synthesis filter 211. The synthesis
filter 211 synthesizes them into speech, which is converted to an
analog signal by a D/A converter 212 to produce an output 108.
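
The role of the synthesis filter can be illustrated with an all-pole
(linear prediction) filter driven by an excitation signal. This is a
sketch under assumptions: the specification does not state the filter
structure at this point, and the function name synthesize and its
gain parameter are hypothetical.

```python
def synthesize(excitation, lpc, gain=1.0):
    """All-pole synthesis: s[n] = gain * e[n] + sum_k lpc[k-1] * s[n-k].

    `lpc` plays the role of the filter control signal (spectrum
    envelope) and `excitation` the role of the input signal (fine
    spectrum).
    """
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                s += a * out[n - k]   # feedback from past output samples
        out.append(s)
    return out
```

For example, a single impulse fed through a first-order filter with
coefficient 0.5 yields a geometrically decaying response.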



Procedures for attaining the functions shown in FIG. 2 by the
arrangement of FIG. 2 are explained in further detail.

For the high efficiency voice coding transmission, the speech input
101 is analyzed into the spectrum envelope information (linear
prediction parameters) by the spectrum envelope analyzer 204-1 which
carries out the linear prediction analysis, timed in the buffer memory
203 and supplied to the fine spectrum analyzer 204-2 (linear
prediction inverse filter). The spectrum envelope information is
quantized by the vector quantizer 208 and sent to the transmitter
206-1. The output of the fine spectrum analyzer 204-2 is also
quantized by the vector quantizer 208 and sent to the transmitter
206-1, where it is merged with the quantized spectrum envelope
information and transmitted.
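
The two analysis steps named above, linear prediction analysis and
inverse filtering, can be sketched as follows. The autocorrelation
method with the Levinson-Durbin recursion is one common way to obtain
linear prediction (and PARCOR) coefficients; that it is the method
used by the analyzer 204-1 is an assumption, as are all function names
here.

```python
def autocorrelation(x, order):
    """Autocorrelation r[0..order] of the frame x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..p] from autocorrelation r.

    Returns (lpc, parcor, error), where the predictor is
    x_hat[n] = sum_k lpc[k-1] * x[n-k].
    """
    a = [0.0] * (order + 1)
    err = r[0]
    parcor = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection (PARCOR) coefficient
        parcor.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)               # remaining prediction error power
    return a[1:], parcor, err

def residual(x, lpc):
    """Inverse filter: prediction residual e[n] = x[n] - x_hat[n]."""
    out = []
    for n in range(len(x)):
        pred = sum(a * x[n - k]
                   for k, a in enumerate(lpc, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out
```

The coefficients describe the spectrum envelope, while the residual
carries the fine spectrum; each would then be vector-quantized
separately before transmission.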

For the voice typewriter function, the spectrum en- 
velope information is converted to a character sequence 
candidate by the voice typewriter recognizer 213 and it 
is sent to the processor 201 where it is used as an input 
to the word processor function of the processor 201. 

A character code sequence may be directly entered from the keyboard
112 without entering the speech information. The process and the
result of the word processing may be displayed (115) on the display
114 as required. The prepared text data is stored in the memory 109.
When it is to be transmitted to another terminal as character data, it
is sent from the processor 201 to the communication line 105 (teletex
network) through the transmitter 206-1.

The processing of the data sent from another terminal is now
explained.

It is not known whether the data sent from another terminal is voice
compressed data or character code data. Because the subsequent
processing differs depending on the type of data, it is necessary to
discriminate the data. The compressed transmission data discriminated
by predetermined processing is decoded into the synthesis parameters
by the spectrum envelope decoder 209 and the fine spectrum decoder
210, and they are synthesized into the speech waveform by the speech
synthesizer 211 and outputted as the synthesized speech 108.

When the text data in the memory 109 is to be outputted by voice, it
is converted to speech synthesis parameters by the synthesis by rule
program of the processor 201, sent to the speech synthesis parameter
buffer 207 and converted to the synthesized speech 108 by the speech
synthesizer 211 through the decoders 209 and 210. The speech synthesis
parameter buffer 207 serves to maintain real-time operation of the
synthesizer and to absorb time variation of the synthesis by rule
parameter generation. It may be arranged between the decoders 209 and
210 and the synthesizer 211.

The character data sent from another terminal is 
displayed (115) on the display 114 through the proces- 
sor 201. 

When the terminal is to be used for mail, the character data or voice
data are held in the memory 109 for a desired time period.

An embodiment in which one speech analyzer is used 
for both the high efficiency coding transmission and the 
speech recognition is explained. 

In the past, the speech analysis of the high efficiency voice coding
unit and the speech analysis of the speech recognition unit (which is
used for the voice typewriting function that converts the voice to a
character string, which may then be transmitted, and for entering
control codes for the terminal) have been independently developed, or
a portion of the linear prediction technique developed for the former
was modified for use for the latter, and the condition of analysis or
the formats of resulting information are different, or only a portion
of information is utilized. Thus, they cannot be used for both
analyses and the resulting information is not fully utilized by both
analyses.

In the present invention, in order to allow sharing of the speech
analyzer by both units, the high efficiency voice coding output is
corrected by using knowledge of voice and matching to a difference
pattern. The output of the speech analyzer of the high efficiency
voice coding system includes the spectrum envelope information (for
example, linear prediction coefficient or PARCOR coefficient), the
fine spectrum information (sound source waveform information) (for
example, prediction residual waveform), power of sound source
waveform, pitch frequency or period of sound source (including
presence or absence of periodicity). They are compared with the vector
code books so that they are encoded to the vector codes. The
information is encoded by the high efficiency voice coding system (to
be described later) and then it is transmitted.

The speech recognizer determines formant and pitch information based
on the output information. This is very effective to improve
performance of phoneme recognition. It has been widely known that the
formant value and the variation thereof in time are very important
information to determine the phonemes. It has also been known from the
[illegible in source] and there is very little case where they are
directly used for the recognition. In the present invention, by taking
the advantage of a vector quantization method for analysis and coding,
a plurality of candidates for the formant frequency and the pitch
frequency corresponding to the vector code are extracted and
represented in [illegible in source] that extraction time is saved and
unstability of extraction [illegible in source].

By utilizing the spectrum information, formant information and pitch
information, the recognition ability is significantly improved over
prior art systems which use only the spectrum information.

An embodiment which integrates the high efficiency speech coding
system of the present invention and the speech recognition system is
explained.

FIG. 3 illustrates processing of a communication terminal which has a
high efficiency speech coding unit and a speech recognition unit. It
is shown by blocks to facilitate understanding of the functions. The
speech input 101 is encoded by the high efficiency speech encoding
unit 300 and the encoded speech signal 301 from the unit 300 is sent
out to the line 105 through an encoded speech interface 302 when it is
to be transmitted, and applied to the speech recognition unit 110 and
also stored in the memory 109 when it is to be used as a speech
recognition input to the data terminal. When the recognition result is
to be checked, the content of the memory 109 is transferred to the
high efficiency voice decoding unit 106. Because it is high efficiency
encoded, the memory capacity required may be small. The recognition
result is sent to the word processor 113 where it is handled in the
same manner as normal keyed-in data, and when it is to be transmitted
as data, it is sent out to the line 105 through the transmitter 206-1
and the line interface 302.

The voice coding unit 300 will be explained later. An embodiment of
the speech recognition unit 110 is shown in FIG. 4.

FIG. 4 shows a block diagram of the speech recognition unit 110. The
encoded speech signal 301 encoded by the voice coding unit 103 is
decomposed into codes by an encoder 401 (which uses the function of
208 of FIG. 2 although it is not essential). Pitch information is sent
to a pitch corrector 402 and other information is sent to a matching
unit 403 and a formant selector 406.

In the pitch extraction method of the present embodiment, the pitch
information is extracted from those having pitch range specified by
using the spectrum information to be described later. Accordingly, the
pitch information is extracted more stably than in a conventional
pitch extraction method. However, since misextraction may occur by an
environmental noise, the extracted pitch information is compared with
preceding and succeeding pitch information by the pitch corrector 402
and if discontinuity which does not occur phonetically [illegible in
source] external insertion is made based on the immediately preceding
pitch information. A simplest correction is to substitute by the
immediately preceding pitch information. The pitch information thus
corrected is [illegible in source] selector 403 ([illegible in source]
213 of FIG. 2) and a matching corrector 408 (processed by the
processor 201 of FIG. 2).

The [illegible in source] unit of the present embodiment [illegible in
source] speech recognition system for an unspecified speaker
[illegible in source] input pitch information, one or more template
sets are [illegible in source] recognition performance and reduce
[illegible in source] of matching processing. The tem[illegible in
source] determine an average value of the input pitch information to
detect an average tone of the speaker. The average pitch $\bar{f}_t$
is given by

$\bar{f}_t = a\,\bar{f}_{t-1} + (1 - a)\,f_t$    (1)

where $f_t$ is the input pitch frequency and $a$ is a time constant
smaller than 1, which is used to determine a range for averaging
effectively.

The template memory 404 (213-2 in FIG. 2) contains the templates in a
form of time serial spectrum code. Instantaneous distance is
calculated by referring the speech input spectrum code and a distance
table 410 (213 in FIG. 2), and the input pattern is continuously
compared with the templates by a continuous dynamic programming (DP)
matching method and candidates are produced. The continuous DP
matching method may be a known method such as that disclosed in
"Conceptual System Design of a Continuous Speech Recognition LSI" by
A. Ichikawa et al, Proceedings of ICASSP 81, 1981.

The formant selector 406 is constructed by the software in the main
processor 201 and takes out a plurality of candidates for first to
third formant frequencies from the formant table by using the input
spectrum code as a key. It is usually difficult to precisely analyze
and extract the formant value on real time. In the present
system, the formant value corresponding to the spectrum code is
precisely determined and it is registered in the table. However, since
the spectrum may be temporarily disturbed by environmental noise, the
second and third formant candidates are prepared in the formant table
and a most appropriate one is selected by taking the continuity into
account. For example, a predicted formant value $\hat{F}_{n,t}$ is
given from the formant table 407 as

$\hat{F}_{n,t} = a_1 F_{n,t-1} + a_2 F_{n,t-2}$

where $F_{n,t}$ is an n-th order formant value corresponding to the
input spectrum code and $a_1$ and $a_2$ are experimentally determined
prediction coefficients. The candidate which is closest to
$\hat{F}_{n,t}$ is selected as $F_{n,t}$. If the candidate is spaced
from $\hat{F}_{n,t}$ by more than a predetermined distance, it is
considered that it is due to the disturbance by the noise and
$\hat{F}_{n,t}$ is selected as $F_{n,t}$. In this manner, continuous
and stable formant frequency is produced. Depending on whether the
pitch information is periodic or nonperiodic, the control is selected
to produce the accurate formant frequency.

Each template in the template memory 404 contains not only the time
sequence of spectrum code but also information on whether the pitch
frequency is rising or falling and whether the n-th formant is rising
or falling. The matching corrector 408 detects the output of the pitch
corrector 402 and/or the formant selector 406 and the matching of
those information to correct the output of the matching unit 405. It
is constructed by the software in the main processor 201. For example,
the corrected matching value D' is given by

$D' = W_F W_P D$

where D is a distance and $W_P$ and $W_F$ are factors of the pitch and
formant. The matching value D' is set to 1.5 when $W_P$ and $W_F$ are
of opposite polarities and 1.0 in other cases. (When the matching
degree is given not by the distance but by correlation or analogy, the
weighting is opposite. The weighting differs depending on the nature
of measurement.)

The corrected matching values are compared by a selector 409 so that a
correct recognition result is obtained.

In accordance with the present invention, the speech analysis and the
encoding which are common to the high efficiency voice coding system
are attained. Thus, in the terminal having both functions, the
analysis unit and the encoding unit may be common and a compact and
economic apparatus can be provided.

An embodiment in which speech synthesis apparatus for reproducing
voice from speech data and speech synthesis apparatus for synthesizing
voice from character data are common is now explained.

In the past, the high efficiency transmission synthesizer which
utilizes the speech output and the synthesizer for synthesis by rule
for synthesizing desired voice have been independently developed, or
the synthesizer developed for the former is used as it is for the
latter as is done in the well-known PARCOR system. The analyzer which
can be used for both purposes and provide high quality of speech
output has not been developed.

In the present invention, the above object is achieved by providing
apparatus for generating a code sequence from an input character
string, in which the code sequence is necessary for hierarchy vector
quantization by residual (HVQR) (to be described later) system.

An embodiment thereof is explained below. Various proposals have been
made for a unit which estimates pronunciation or accent from an input
character string and it does not constitute an essential part of the
present invention. Accordingly, the explanation thereof is omitted. In
the following description, it is therefore assumed that a pitch
frequency pattern for intonation due to the pronunciation sequence or
accent information has already been generated. In the present
embodiment, the HVQR system is based on LPC system parameters. In the
present embodiment, the spectrum parameter is vector-quantized based
on the LPC coefficient or PARCOR coefficient, and the sound source
information includes residual waveform, pitch frequency and residual
amplitude of the sound source waveform in a coded form. When other
coding parameters are combined with the synthesis by rule of the
present invention, the parameters which fit thereto are selected.

Referring to FIG. 5, a code received by a receiver 521 (206-2 in FIG.
2) is separated by the decoder 205 to a spectrum information code 523
and an excitation information code 524 and they are sent to buffers
531 and 532, respectively. The spectrum information vector code is
supplied to an excitation selector 533 and a speech synthesizer 538,
and the excitation information code is further separated to residual
waveform vector code, pitch period code and residual amplitude code.
The residual waveform vector code is supplied to the excitation
(residual waveform) selector 533, the pitch period code is supplied to
the excitation selector 533 and an excitation reproducer 535, and the
residual amplitude code is supplied to the excitation reproducer 535.

The excitation selector 533 selects the excitation (residual) waveform
to be used for the synthesis from an excitation vector code book 534
based on the spectrum vector code, residual waveform vector code and
pitch period code, and sends it to the excitation reproducer 535. The
excitation reproducer 535 converts the selected excitation waveform to
a repetitive waveform by using the pitch period code, corrects the
waveform amplitude by the residual amplitude code and reproduces a
series of excitation waveforms, which are sent to the speech
synthesizer 211.

A spectrum information reproducer 538 reads out the spectrum
information to be used from the spectrum vector code book 537 based on
the spectrum vector code and sets it into the synthesis filter 211
which reads in the reproduced excitation waveform from the excitation
waveform reproducer 535 to synthesize the speech, which is produced as
a synthesized/reproduced waveform 108 through the D/A converter 212.

The synthesis by rule unit 501 is now explained in connection with
synthesis of a Japanese word. This processing is carried out by the
synthesis by rule program in the main processor 201 of FIG. 2. Other
languages can be similarly processed by properly selecting a synthesis
unit and language processing method.

The input character code sequence is converted to a pronunciation code
sequence by a synthesis by rule linguistic processor 511 and it is
time-segmented for assignment to accent and intonation. Specific
procedures thereof are different from language to language and various
procedures have been proposed for certain languages including Japanese
and English. Since the procedure itself is not an essential part of
the present invention, it is not explained here. Based on the time
segmentation and the intonation and accent determined
by the linguistic processor, the intonation pattern, particularly a
pitch period pattern, is generated by a pitch pattern generator 512.
The generation procedure therefor can be realized by the generation
model proposed by Fujisaki ("Analysis of Voice Fundamental Frequency
Contours for Declarative Sentences of Japanese" by H. Fujisaki et al,
J. Acoust. Soc. Jpn. (E) 5, 4 (1984) p. 233).

The linguistic information and pitch pattern information thus produced
are sent to a synthesis code generator 513. Inputs to the synthesis
code generator 513 include the spectrum envelope information, pitch
information and amplitude code necessary for the speech synthesis. The
output thereof may be represented in the same form as the high
efficiency coding system code. By preparing a data table which is used
for the synthesis by rule, the synthesis unit can be shared as will be
explained later.

In FIG. 6, in order to synthesize a Japanese word "ohayoo" (good
morning), the synthesis units are "o", "ha", "yo" and "o" in
accordance with syllables of the Japanese word, and they are
time-segmented. In FIG. 6, an abscissa represents a time (t) and an
ordinate represents a pitch frequency f0 (Hz). When the synthesis code
generator 513 receives the information shown in FIG. 6, it
sequentially reads out codes having most closely matching
characteristic to the input information from the synthesis by rule
code dictionary, and sends them to a speech synthesis buffer 515 in
the same form as the code of the high efficiency coding system. In
order to simplify the explanation, the range of the pitch frequency is
divided into three mutually overlapping regions as shown in FIG. 6.
(The actual number of regions is larger depending on the quality of
speech required.)

FIG. 7 shows a construction of the synthesis by rule code dictionary.
Synthesis code sequences a11, a12, ... etc. can be accessed by using
synthesis units "a", "i", ... and the pitch period regions ①, ②, ③ as
keys. Each synthesis code is recorded as a code sequence of a maximum
anticipated length n (n × 10 ms when control interval is 10 ms) for
each control interval of the speech synthesizer. Each code consists of
an excitation amplitude code A, a spectrum vector code P and an
excitation waveform vector code W. In FIG. 6, if the first synthesis
unit "o" of the Japanese word "ohayoo" has a length of 120 ms, the
pitch range belongs to ③ and o3,1, o3,2, ..., o3,12 (120/10 = 12) are
read from the line ③ for "o" of the synthesis by rule code dictionary
of FIG. 7 and they are sent to the speech synthesis buffer. The pitch
code corresponding to ③ and the corresponding value in FIG. 6 is also
sent out. Those codes are edited such that mutual positional
relationship thereof is equal to that of the high efficiency voice
coding system. In the present system, the excitation amplitude
information is not selected directly from the synthesis by rule code
dictionary but it may be modified by the synthesis code generator 513.

A high efficiency voice coding system suitable to a voice
communication system in which the speech synthesis unit and the speech
analysis unit are common with the speech recognition and speech
synthesis by

[...] mit distinction of a speaker. Approaches to resolve this problem
have been proposed (multipulse method: "A New Model of LPC Excitation
for Producing Natural-Sounding Speech at Low Bit Rates" by B. S. Atal
et al, Proc. ICASSP 82, S5.10, 1982, and thinned-out method: "A Speech
Coding Method Using Thinned-out Residual" by A. Ichikawa et al, Proc.
ICASSP 85, 25.7, 1985). In order to secure the desired quality of
sound, information quantity of higher than a predetermined quantity
(approximately 8 K bps) is necessary, and it is difficult to compress
the speech data to 2-2.4 K bps which is adopted in the international
data line.

Another method for largely compressing the speech information is a
vector quantization method (for example, "Segment Quantization for
Very-Low-Rate Speech Coding" by S. Roucos et al, Proc. ICASSP 82, p.
1563). This method handles the data of lower than 1 K bps and lacks
clearness of vocal sound. A combination of the multi-pulse method and
the vector quantization has also been studied, but since the
excitation information for determining the fine spectrum requires
substantial amount of information even after it has been vector-coded,
it is difficult under the present circumstance to transmit the speech
signal having the quality of higher than 10 K bps with the information
quantity of 2 K bps.

Since the speech is generated by a mouth having a physical
restriction, physical characteristics thereof vary depending on the
mouth. In the vector quantization method, a range of the speech is
segmented, symbols are assigned to the sections, and the speech is
transmitted by the symbols. In the LPC method, the speech is divided
into the spectrum envelope information and the fine spectrum
information and they are encoded and transmitted. In the receiving
station, they are combined to reproduce the speech. It permits
efficient compression of speech information and has been widely used.

The spectrum envelope information is generally suitable to vector
quantization. On the other hand, the fine spectrum information is
close to white noise in characteristic and it is considered as the
white noise and vector-coded for transmission. (For example, "A
Stochastic Model of Excitation Source for Linear Prediction Speech
Analysis-Synthesis" by G. Oyama et al, Proc. ICASSP 85, 25.2, 1985).
The difficulty in compressing the information has been described
above. (If the proposal by G. Oyama is converted to the information
quantity, it is anticipated that only the fine spectrum information
needs approximately 11.2 K bps.)

In the present system, it has been noticed that the envelope
information and the fine spectrum information have a strong
correlation therebetween, and the above problem is resolved by
compressing the information by using the correlation.

It has been well known that the spectrum envelope information and the
pitch frequency have a correlation therebetween. For example, a male
has a larger body than a female and has a larger mouth for generating
a voice. Accordingly, a formant frequency (resonance frequency of the
mouth) of the male, which is the spectrum envelope information, is
usually lower than that of the female. On the other hand, the pitch
frequency of the voice of the male is lower than that of the female,
rule respectively is now explained. This has been experimentarily proved. (For example, 

The PARCOR system and the LSP system have been "Oral perception Sense and Speech" edited by Miura, p 
well known as the high efficiency voice coding system 65 355, published by Association of Electronics and Elec- 
for less than 10 K bps and they have been practically trical Communication of Japan, 1980.) 
used. However, the quality thereof is not sufficiently It has also been known that there is a high correlation 
high to allow transmission of fine tone in order to per- between the pitch frequency and the excitation ampli- 
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tude. (For example, "Generation of Pitch Quanta by The pitch extractor 806 may be constructed by well- 
Amplitude Information" by Suzuki et al, p 647, Papers known AMDF method or auto-correlation method, 
of Japan Acoustic Association, May 1980). The present The pitch selector 807 fetches the pitch range desig- 
system provides a new system for compressing the in- nated by the spectrum vector code from the pitch range 
formation by utilizing such correlations* 5 data memory 80S, selects a pitch frequency from the 

The speech to be transmitted is converted to a vector pitch candidates produced by the pitch extractor 806 by 
symbol sequence by vector-quantizing the spectrum the software of the control unit 201 (FIG. 2), and sends 
envelope information. Then, the fine spectrum informa- it to the code editor/transmittor 813 and a residual 
tion is extracted only from vectors of those fine spec- waveform vector code selector 810. 
trum information which have high correlation to the 10 The residual waveform extractor 809 comprises a 
symbols. Thus, a range of the fine spectrum vector is conventional linear prediction type inverse filter, and it 
specified by the spectrum envelope vector instead of fetches the spectrum information corresponding to the 
selecting the vector from the entire possible range of the code selected by the spectrum vector code selector, 
fine spectrum vectors, and they among the specified from the spectrum vector code book and sets it into the 
vector the fine spectrum vector is specified, so that the 15 inverse filter, and receives the corresponding input 
information quantity can be significantly reduced. In speech waveform stored in the buffer 203 to extract the 
the fine spectrum information, the information can be residual waveform. The spectrum information pro- 
compressed by hierarchaily coding the information by duced by ^ spectrum extractor 803 may be used in this 
utilizing the correlations between the pitch frequency, step The extracted residual waveform is sent to the 
and the excitation amplitude and the residual excitation 20 residua i waveform vector code selector 810 and a resid- 
waveform. ual amplitude extractor 812. The residual amplitude 

A specific embodiment of the present system is ex- extractor 812 produces the average output of the resid- 
plained with reference to FIGS. 8 and 9. ^ waveform md sends it to the residual waveform 

In the present system, the spectrum envelope infor- vector code ^^j. 810 md ±e code editor/transmit- 
mation is the linear prediction coefficient and the fine 25 tef ^ 

spectrum iiiformation is the prediction residual wave- ^ * residual ^form vector code selector 810 fet- 
form, although the present system is not limited to the ches ^ candidate residual waveform vector from a 

a ^°2 m r?!?. 0n \u ^ *>u u- ur™ residual waveform vector code book 811 based on the 

FIG.8dlustra^ spectrum vector code and the pitch frequency, and 

voice cc^mgumt The ^V^^ 30 Compares it with the residual waveform sent from the 

dure correspond to fce elements of FIG. 2 as follows. ? waveform extractor 809 to determine the most 

The speech mput 101, A/D converter 202 and buffer "rr r. T - „ , - tn 

203 are common to both figures. A spectrum extractor matching residual waveform »^ J* 

803, a pitch extractor 806 and a residual waveform compare those, the "^^^^SS^ 

extractor 809 of FIG. 8 correspond to the spectrum 35 mformation is normalized. The 

analyzer 204-1, predictive residual analyzer 2M-2 and «™ vector code 13 ^ to ^ code editor/transmitter 

the pitch extractor 204-3 in the DSP 204 of FIG. 2. The 81 fv . . . 

r^ocessing in a residual amplitude extractor 812 is car- The editor/transmitter 813 edits die sr^ctrum 

ried out by the software in the DSP 204. The process- vector code, residual waveform vector code, piteh per- 

ings of a spectrum vector code book 804 and a spectrum 40 iod code and residual amphtude code and sends them as 

vector selector 805, a pitch range data memory 808 and the encoded speech signal 301. This processing is car- 

a pitch selector 407, and a residual waveform vector riedj rat by the transmitter 206-1 of the hne mterface 206 

code book 411 and a residual waveform code selector of ^9* f w ^ ^ x . . , , - 

410 correspond to the processings of the vector code Referring to FIG. 9, the procedure of the high effi- 

book 208-2 and the matching unit 208-1 of the vector 45 ciency voice decoder is explained, 

quantizer 208 of FIG. 2. The processing steps of the ^ FIG. 9, the code sent from a transrmssion hne 914 

elements are controlled by a program in the processor * "*ewed by a received code demultiplexer 915 which 

of the control unit 201. demultiplexes it to spectrum vector code, residual 

The processing steps of FIG. 8 are explained below. waveform vector code, -pitch period code and residual 

In FIG. 8, the speech input 101 is digitized by the 50 amplitude code. 

A/D converter 202 and it is sent to the input buffer 203. The spectrum vector code is sent to a residual wave- 

The buffer 203 is of two-side structure so that it can form code vector selector 916 and a speech synthesizer 

hold the next speech input without interruption during 519, the residual waveform vector code is sent to a 

the encoding of the current input speech. The speech residual waveform code vector selector 916, the pitch 

signal in the buffer is fetched for each section and sent 55 period code is sent to the residual waveform code vec- 

to the spectrum vector code selector 805, pitch extrac- tor selector 916 and a residuai waveform reproducer 

tor 806, and residual waveform extractor 809. 918, and the residual amplitude code is sent to the resid- 

The spectrum vector code selector 805 makes the ual waveform reproducer 918. 

linear prediction analysis in a well-known method and The residual code vector selector 916 selects the 

sequentially compares the resulting prediction coeffici- 60 residual waveform from the residual vector code book 

ent to the spectrum information in the spectrum vector 917 based on the spectrum vector code, residual vector 

code book 804 to select the spectrum having a highest code and pitch period code, and sends it to the residual 

likelihood. This step can be carried out by a conven- waveform reproducer 918. The residual waveform re- 

tional speech recognition unit producer 918 converts the selected residual vector code 

The selected spectrum vector code is sent to the pitch 65 to a repetitive waveform by using the pitch period code, 

selector 807 and the code editor/transmittor 813, and corrects the amplitude by the residual amplitude code 

the corresponding spectrum information is sent to the and reproduces a series of residual waveform, which is 

residual waveform extractor 809. sent to the speech synthesizer 919. 
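The encoder and decoder steps described above (a spectrum vector code chosen first, then a pitch chosen within the range that code designates, then a residual waveform vector chosen only from the candidates associated with that spectrum code and pitch) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy codebooks `SPECTRUM_BOOK`, `PITCH_RANGES` and `RESIDUAL_BOOK`, the two-step pitch quantization, the squared-error distance, the use of mean absolute value as the "average output", and the fixed repeat count in the decoder are all hypothetical choices introduced here.

```python
# Hypothetical toy codebooks; real ones would be trained from speech data.
SPECTRUM_BOOK = {0: [1.0, 0.2], 1: [0.3, 0.9]}         # spectrum code -> envelope vector
PITCH_RANGES = {0: (80.0, 160.0), 1: (150.0, 300.0)}    # spectrum code -> allowed pitch (Hz)
PITCH_STEPS = 2                                         # pitch codes per range (assumed)
RESIDUAL_BOOK = {                                       # (spectrum code, pitch code) -> shapes
    (0, 0): [[1.5, 0.0, -1.5], [0.75, 0.75, -1.5]],
    (0, 1): [[1.5, -1.5, 0.0], [0.0, 1.5, -1.5]],
    (1, 0): [[0.0, 1.5, -1.5]],
    (1, 1): [[-1.5, 1.5, 0.0], [1.5, 0.0, -1.5]],
}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, candidates):
    """Index of the candidate closest to vec."""
    return min(range(len(candidates)), key=lambda i: sq_dist(vec, candidates[i]))


def select_pitch(spectrum_code, pitch_candidates):
    """Keep only a pitch inside the range designated by the spectrum code."""
    lo, hi = PITCH_RANGES[spectrum_code]
    in_range = [p for p in pitch_candidates if lo <= p <= hi]
    # Fallback (assumed): clamp the first candidate into the range.
    pitch = in_range[0] if in_range else min(max(pitch_candidates[0], lo), hi)
    pitch_code = min(int((pitch - lo) / (hi - lo) * PITCH_STEPS), PITCH_STEPS - 1)
    return pitch, pitch_code


def encode_frame(envelope, pitch_candidates, residual):
    """Return (spectrum code, pitch code, residual code, amplitude) for one frame."""
    codes = sorted(SPECTRUM_BOOK)
    spectrum_code = codes[nearest(envelope, [SPECTRUM_BOOK[c] for c in codes])]
    _, pitch_code = select_pitch(spectrum_code, pitch_candidates)
    amplitude = sum(abs(x) for x in residual) / len(residual)
    # Normalize so that only the waveform shape is compared.
    normalized = [x / amplitude for x in residual] if amplitude else residual
    # Search only the candidates tied to this spectrum code and pitch code.
    residual_code = nearest(normalized, RESIDUAL_BOOK[(spectrum_code, pitch_code)])
    return spectrum_code, pitch_code, residual_code, amplitude


def reproduce_residual(spectrum_code, pitch_code, residual_code, amplitude, repeats=2):
    """Decoder side: same restricted lookup, rescale, then repeat the period."""
    shape = RESIDUAL_BOOK[(spectrum_code, pitch_code)][residual_code]
    return [amplitude * x for x in shape] * repeats


codes = encode_frame([0.9, 0.1], [240.0, 120.0], [3.0, -3.0, 0.0])
print(codes)                       # (0, 1, 0, 2.0)
print(reproduce_residual(*codes))  # [3.0, -3.0, 0.0, 3.0, -3.0, 0.0]
```

Because the residual index only has to distinguish the candidates within one restricted list, it needs fewer bits than an index into a single codebook holding every residual shape. A real decoder would derive the repetition period from the pitch period code rather than from a fixed `repeats` argument.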
The speech synthesizer 919 reads out the spectrum vector to be used from the spectrum vector code book 920 based on the spectrum vector code, sets it into the internal synthesis filter, and receives the reproduced residual waveform to synthesize the speech. The speech synthesis filter may be a conventional LPC type speech synthesis filter for RELP.

The synthesized speech waveform is converted by the D/A converter 921 to an analog signal to reproduce a speech signal 922.

By registering a tone signal in the spectrum vector code book, a signal other than speech can be transmitted.

In accordance with the present system, speech of very high quality can be encoded with a small information quantity.

Since the processing in the receiving unit when the character code has been transmitted is different from that when the speech signal has been high efficiency [illegible] the following manner. In the following description, the teletex network is used as the transmission network.

In the teletex, not all of the codes correspond to characters but certain codes are not used. These codes are used as control codes for speech codes. In FIG. 2, a command to transmit the speech signal is issued by the processor 201 (which also functions as the controller) to the transmitter 206-1 in the line interface unit 206. The transmitter 206-1 adds the control code and the number of codes (for example, 1024 words) to be used for the transmission of the speech signal to the head of the codes and transmits the high efficiency coded speech codes by the number equal to said number of codes. After the designated number of codes have been transmitted, the transmitter returns to the character code transmission mode. When the speech signal is to be continuously transmitted, the above operation is repeated. The receiver 206-2 of the interface unit 206 is normally in the character code reception mode. If the received code is the speech transmission code, the number of codes to be used for the subsequent speech transmission is decoded and the codes that follow, up to that number, are treated as speech codes. This is reported to the processor 201 and the received data is written into the synthesizer 107 or the memory 109 at the address assigned for the voice mail. After the designated number of codes have been received, the receiver returns to the character code reception mode. Other transmission control is the same as that of the teletex. This arrangement permits teletex communication by the standard teletex terminal.

We claim:

1. Character and voice communication system comprising:

(1) voice encoding means including means for receiving a speech signal, speech analysis means for analyzing the speech signal to produce spectrum envelope information and fine spectrum information and encoding means for encoding said spectrum envelope information and said fine spectrum information, said speech analysis means being used for both speech transmission and speech recognition;

(2) speech recognition means for recognizing said speech signal using said spectrum envelope information and converting a result of said recognition into character code strings;

(3) keyboard means for inputting characters and converting said characters into character code strings;

(4) reception and transmission means for receiving and transmitting said encoded spectrum envelope information and said fine spectrum information and either said character code strings from said speech recognition means or said character code strings from said keyboard means;

(5) voice decoding means including decoding means for decoding said encoded spectrum envelope information and said encoded fine spectrum information [illegible] said reception and transmission means, text-to-speech rule means for converting said character code strings from said speech recognition means or said keyboard means into spectrum envelope information and fine spectrum information in accordance with a predetermined rule, and speech synthesis means for synthesizing a speech [illegible] fine spectrum information from said text-to-speech rule means.

2. A character and voice communication system according to claim 1 wherein said reception and transmission means includes means for distinctively transmitting and receiving [illegible] information and said character code strings.

3. A character and voice communication system according to claim 1, wherein said speech synthesis means of said voice decoding means includes speech synthesis rule means for converting said character code strings to a speech signal.

4. A character and voice communication system according to claim 1 wherein said speech analysis means of said voice encoding means includes means for separating the speech signal into spectrum envelope information and fine spectrum information, said voice encoding means further includes vector quantization means for producing code information to classify the spectrum envelope information into a limited number of patterns and means for encoding the fine spectrum information, wherein said means for encoding the fine spectrum information is controlled by the code information produced by said vector quantization means.

5. A character and voice communication system according to claim 4 wherein said means for encoding the fine spectrum information controls a range of pitch variation and type of excitation waveform and a range of excitation waveform amplitude using the code information produced by said vector quantization means.

6. A character and voice communication system according to claim 1 wherein said voice decoding means includes spectrum envelope decoding means and fine spectrum fine decoding means.

7. A character and voice communication system according to claim 1 further comprising means for synthesizing a speech signal using the output from said speech recognition means.

8. A character and voice communication system according to claim 1 further comprising means for displaying the signal converted by said speech recognition means.